
The IBM 2016 Speaker Recognition System

Abstract

In this paper we describe the recent advancements made in the IBM i-vector speaker recognition system for conversational speech. In particular, we identify key techniques that contribute to significant improvements in performance of our system, and quantify their contributions. The techniques include: 1) a nearest-neighbor discriminant analysis (NDA) approach that is formulated to alleviate some of the limitations associated with the conventional linear discriminant analysis (LDA) that assumes Gaussian class-conditional distributions, 2) the application of speaker- and channel-adapted features, which are derived from an automatic speech recognition (ASR) system, for speaker recognition, and 3) the use of a deep neural network (DNN) acoustic model with a large number of output units (~10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 speaker recognition evaluation (SRE) extended core conditions involving telephone and microphone trials. Experimental results indicate that: 1) the NDA is more effective (up to 35% relative improvement in terms of EER) than the traditional parametric LDA for speaker recognition, 2) when compared to raw acoustic features (e.g., MFCCs), the ASR speaker-adapted features provide gains in speaker recognition performance, and 3) increasing the number of output units in the DNN acoustic model (i.e., increasing the senone set size from 2k to 10k) provides consistent improvements in performance (for example from 37% to 57% relative EER gains over our baseline GMM i-vector system). To our knowledge, results reported in this paper represent the best performances published to date on the NIST SRE 2010 extended core tasks.
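To make the NDA idea concrete, the sketch below computes a nonparametric between-class scatter matrix in the classical form of Fukunaga and Mantock, which nearest-neighbor discriminant analysis builds on. Instead of comparing class means as LDA does, each sample is compared with the local mean of its k nearest neighbors in every other class, and its contribution is weighted so that samples near a class boundary count most. Whether the IBM system uses exactly this weighting is an assumption; the function name, the list-of-lists input layout, and the values of `k` and `alpha` are hypothetical choices for illustration, not details from the paper.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nda_between_scatter(classes, k=2, alpha=1.0):
    """Nonparametric (nearest-neighbour) between-class scatter matrix.

    classes: list of classes (e.g. speakers), each a list of equal-length
             vectors such as i-vectors; hypothetical input layout.
    k:       neighbours used for each local mean (assumed value).
    alpha:   exponent in the boundary weight (assumed value).
    """
    if any(len(c) < 2 for c in classes):
        raise ValueError("each class needs at least two samples")
    dim = len(classes[0][0])
    sb = [[0.0] * dim for _ in range(dim)]
    for i, own_class in enumerate(classes):
        for idx, x in enumerate(own_class):
            # Distances to the other members of x's own class (x excluded).
            own = sorted(_dist(x, y) for m, y in enumerate(own_class) if m != idx)
            # k-th nearest same-class distance (farthest if the class is smaller).
            d_own = own[min(k, len(own)) - 1] ** alpha
            for j, other in enumerate(classes):
                if j == i:
                    continue
                # Up to k nearest neighbours of x in class j and their local mean.
                nbrs = sorted(other, key=lambda y: _dist(x, y))[:k]
                local_mean = [sum(v[d] for v in nbrs) / len(nbrs) for d in range(dim)]
                d_other = _dist(x, nbrs[-1]) ** alpha
                total = d_own + d_other
                # Weight is ~0.5 near the class boundary and tends to 0 far from it.
                w = min(d_own, d_other) / total if total > 0 else 0.0
                diff = [a - b for a, b in zip(x, local_mean)]
                for r in range(dim):
                    for c in range(dim):
                        sb[r][c] += w * diff[r] * diff[c]
    return sb


spk_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
spk_b = [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]
print(nda_between_scatter([spk_a, spk_b], k=2))
```

The within-class scatter S_w is the same as in LDA; an NDA projection would then be taken from the leading eigenvectors of S_w^{-1} S_b, a step omitted here because it needs a linear-algebra library.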
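The third technique changes where the frame-level soft alignments come from. In a GMM i-vector system they are UBM component posteriors; in the system described here they would be the posteriors of the DNN's senone outputs. Either way, the i-vector front end consumes the same per-utterance zeroth- and first-order (Baum-Welch) statistics, N_c = Σ_t γ_t(c) and F_c = Σ_t γ_t(c) x_t. The plain-Python sketch below shows only that accumulation step, optionally centered by per-class means. The function and argument names are hypothetical, and the DNN itself and the total-variability projection are out of scope.

```python
def baum_welch_stats(posteriors, features, means=None):
    """Zeroth- and first-order statistics for i-vector extraction.

    posteriors: T x C list; posteriors[t][c] is the soft alignment of frame t
                to class c (here assumed to be a DNN senone posterior).
    features:   T x D list of acoustic feature vectors (e.g. MFCCs).
    means:      optional C x D list of per-class means; if given, the
                first-order statistics are centred (hypothetical parameter).
    Returns (n, f): n[c] = sum_t gamma_t(c), f[c] = sum_t gamma_t(c) * x_t.
    """
    if len(posteriors) != len(features):
        raise ValueError("need one posterior vector per feature frame")
    num_classes, dim = len(posteriors[0]), len(features[0])
    n = [0.0] * num_classes
    f = [[0.0] * dim for _ in range(num_classes)]
    for gamma, x in zip(posteriors, features):
        for c, g in enumerate(gamma):
            n[c] += g  # zeroth order: soft frame count
            for d in range(dim):
                f[c][d] += g * x[d]  # first order: posterior-weighted sum
    if means is not None:
        # Centred first-order statistics: F_c - N_c * mu_c.
        f = [[f[c][d] - n[c] * means[c][d] for d in range(dim)]
             for c in range(num_classes)]
    return n, f


post = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # 3 frames, 2 classes
feats = [[1.0, 2.0], [3.0, 0.0], [0.0, 4.0]]  # 3 frames, 2-dim features
n, f = baum_welch_stats(post, feats)
```

Because each posterior row sums to one, the zeroth-order counts sum to the number of frames. The triple loop is O(T·C·D), so with ~10k senones a real system would use a vectorized library rather than plain lists.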
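The results are stated as equal error rates (EER), the operating point where the false-acceptance and false-rejection rates coincide, and as relative EER gains over a baseline. A minimal sketch of both computations follows, assuming higher scores mean "same speaker". The threshold sweep is a simple approximation; the official NIST scoring tools may locate the crossing point differently.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over every observed score.

    A trial is accepted when score >= threshold. At each threshold the
    false-rejection rate (targets rejected) and false-acceptance rate
    (non-targets accepted) are computed; the EER is taken as their mean at
    the threshold where they are closest. O(n^2), fine for a sketch.
    """
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction; 0.35 means the new EER is 35% lower."""
    return (baseline_eer - new_eer) / baseline_eer


print(equal_error_rate([0.8, 0.6, 0.4], [0.5, 0.3, 0.1]))
print(relative_improvement(2.0, 1.3))
```

With hypothetical values, lowering the EER from 2.0% to 1.3% gives a relative improvement of about 0.35, i.e. the "35% relative" kind of figure quoted above; these numbers only illustrate the arithmetic and are not results from the paper.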